This is a tutorial for using my currently unnamed package of Seurat/DESeq auxiliary functions, RNAseqPlotR.R.
RNAseqPlotR.R is a colorblindness-friendly package of functions built for the analysis and visualization of RNAseq data. It includes various helper and plotting functions for working with RNAseq data. I personally use the helper functions (especially meta() and get.metas()) constantly, but I imagine that the main draw for others will be the plotting functions.
All plotting functions spit out default-themed plots from minimal coding input for daily analysis needs, but they also allow enough customization to produce submission-quality figures out of the box.
I built the functions while analyzing single cell RNAseq data with Seurat, but since then, I have started to add functionality for handling bulk RNAseq data as well. Currently, the bulk capabilities only extend to DESeq-analyzed bulk data, but I plan to add functionality for edgeR in the near future.
NOTE: These functions are currently in a functional though not yet prime-time state. My goal is that they will eventually be released as part of a package that will have all the typical function documentation. Currently, documentation is in the form of header comments in the function code. Hopefully, with those and with this tutorial, all the workings of these functions can be figured out! If you have questions about how to do something, or would like to suggest new features, my email is daniel.bunis@ucsf.edu.
DBDimPlot = handles all needs for Seurat TSNEPlot / PCAPlot / DimPlot functions. Improves on the Seurat functions’ capabilities to present continuous (including negative) numerical data, or discrete data (clustering, samples, batches, condition, etc.), in various ways.
DBPlot = handles needs of Seurat’s VlnPlot function. Allows generation of jitter/dot-plot, boxplot, and/or violin-plot representations of numerical data, with the order of what’s on top easily settable. Data can be expression of particular genes or any numerical metadata like percent.mito, nUMI, and nGene. Colors and grouping of cells are tunable through discrete inputs.
DBBarPlot = No analogous function currently in Seurat. Handles plotting of discrete data on a per-sample or per-condition grouping. Essentially, it is similar to DBPlot, but for discrete variables. Example: cluster makeup of a sample.
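The DBBarPlot example use, the cluster makeup of a sample, comes down to a per-group proportion table. The base-R sketch below computes that quantity for a handful of made-up cells. It is an illustration of the quantity being plotted, not DBBarPlot's internals, and it assumes the bars represent within-sample fractions rather than raw counts.

```r
# Illustration only (not RNAseqPlotR.R code): the per-sample cluster makeup
# that a DBBarPlot-style bar chart summarizes. Toy data, one entry per cell.
cluster <- c("GMP", "GMP", "HSC", "MEP", "HSC", "GMP")
sample  <- c("Adult-1", "Adult-1", "Adult-1", "Fetal-1", "Fetal-1", "Fetal-1")

# Cross-tabulate cells by cluster (rows) and sample (columns), then divide
# each column by its total so every sample's fractions sum to 1.
makeup <- prop.table(table(cluster, sample), margin = 2)
round(makeup, 2)
```

Each column of `makeup` corresponds to one bar, and each row to one fill color within it.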
Loading in the functions…
source("~/Desktop/RNAseqPlotR.R")
Now, load in your Seurat dataset, and you’ll be ready to get going! If you are working with bulk RNAseq data, there are a few extra steps. I will eventually build an entire tutorial for this type of data.
#Load in the Seurat object used for making all example figures in this vignette:
HSPCs <- readRDS("~/Box Sync/Layering Analysis/10X/Savespot1901/HSPCs_cca-alinged.rds")
The basic use of most functions, including all of the plotting functions, is function(var, object, other.inputs), where var is the target variable (often this will be the “name” of a metadata or gene) and object is the target Seurat object. One of the most useful other inputs is probably cells.use.
var - When var is given as the string name of a gene or metadata, default plot titles are generally generated and the functions automatically grab the relevant values to plot. A discrete vector can also be provided, but note that even if cells.use is going to be used to subset down to only certain cells, this vector must include data for all the cells.
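The rule that a provided vector must cover every cell, even when cells.use subsets it, makes sense if the filter is applied after the values are collected. The sketch below mimics that ordering with a hypothetical helper, `apply_cells_use()`, which is not part of RNAseqPlotR.R. It assumes cells.use is a logical vector with one entry per cell, as in the `cells.use = meta(...) == "GMP"` examples later in this tutorial.

```r
# Hypothetical helper (not in RNAseqPlotR.R): a per-cell vector and a
# logical cells.use filter are expected to line up one-to-one.
apply_cells_use <- function(values, cells.use, n.cells) {
  # The full vector is required, because cells.use indexes into all cells
  if (length(values) != n.cells) {
    stop("var must contain one value per cell (", n.cells, "), ",
         "even when cells.use will subset it")
  }
  if (is.null(cells.use)) return(values)
  values[cells.use]
}

ages <- c("adult", "fetal", "cord", "fetal")  # one value per cell
keep <- ages != "cord"                         # logical cells.use

apply_cells_use(ages, keep, n.cells = 4)       # "adult" "fetal" "fetal"
```

Passing the already-subset vector, `ages[keep]`, together with `keep` fails the length check, which is the mistake the note above warns against.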
object & DEFAULT - Object can be given in 3 ways:
Using the 3rd method allows for the quickest use of the plotting functions. For example, a t-SNE plot of cells colored by their age can then be generated with just DBDimPlot("age") rather than with DBDimPlot("age", "HSPCs"), as shown below.
Other Required Inputs, group.by and color.by - DBBarPlot and DBPlot have 1 and 2 other required inputs, respectively. For both, group.by is required in order to set the x-axis groupings. For DBPlot, color.by is required as well, for setting the fill color of the violin plots or boxplots. The input for each of these variables is the “quoted” name of a meta.data slot of the object: whatever metadata you wish to have the cells grouped by / colored by.
DEFAULT <- "HSPCs" #After setting this, the object slot can be left out entirely!
#DBDimPlot
DBDimPlot("age")
#DBPlot
DBPlot("percent.mito", group.by = "Sample", color.by = "age")
#DBBarPlot
DBBarPlot("new.HSPCcelltype", group.by = "Sample")
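One way a DEFAULT variable can make the object slot optional is an R default argument, which is only evaluated when the function first uses it. The sketch below shows that pattern with hypothetical names (`resolve_object()`, `plot_like()`, `demo_obj`) and a plain list standing in for a Seurat object; it is an assumption about the mechanism, not the RNAseqPlotR.R source. It covers the two forms used in this tutorial: the object given as a quoted name, and the object omitted in favor of DEFAULT.

```r
# Hypothetical sketch (not RNAseqPlotR.R code): an `object = DEFAULT`
# default argument lets callers leave the object out entirely.
resolve_object <- function(object) {
  # A character string is treated as the name of an object in the workspace
  if (is.character(object)) get(object, envir = globalenv()) else object
}

plot_like <- function(var, object = DEFAULT) {
  # DEFAULT is looked up lazily, at the moment `object` is first used here
  object <- resolve_object(object)
  object$meta.data[[var]]
}

# Plain-list stand-in for a Seurat object with one metadata column
demo_obj <- list(meta.data = data.frame(age = c("adult", "fetal"),
                                        stringsAsFactors = FALSE))
DEFAULT <- "demo_obj"

plot_like("age")              # object found through DEFAULT
plot_like("age", "demo_obj")  # object given as a quoted name
```

Because the default is evaluated lazily, DEFAULT only needs to exist by the time a function is called, not when it is defined.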
“ident” = If “ident” is provided as the var then the functions know to grab object@ident.
“gene-name” or “meta-name” = If a character string other than “ident” is provided, then the helper functions is.meta() and is.gene() are called to determine how to proceed. If the “quoted” name of a metadata slot is given as var, then object@meta.data[, var] will be used and the plot will be titled “var” by default. If the “quoted” name of a gene is given as var, then object@data[var, ] will be used and the plot will be titled “Expression of var” by default.
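The dispatch rules above can be sketched as a single lookup function. This is a hypothetical reconstruction, `resolve_var()`, using a plain list with `ident`, `meta.data`, and `data` entries as a stand-in for the corresponding Seurat v2 slots; it is not the package's code, and checking metadata before genes is an assumption.

```r
# Hypothetical sketch of the var-dispatch rules (not RNAseqPlotR.R code).
# `object` is a plain list standing in for a Seurat v2 object.
resolve_var <- function(var, object) {
  if (identical(var, "ident")) {
    return(list(values = object$ident, title = "ident"))
  }
  if (var %in% colnames(object$meta.data)) {   # is.meta()-style check
    return(list(values = object$meta.data[[var]], title = var))
  }
  if (var %in% rownames(object$data)) {        # is.gene()-style check
    return(list(values = object$data[var, ],
                title = paste("Expression of", var)))
  }
  stop("'", var, "' is not \"ident\", a metadata column, or a gene")
}

# Toy object: 3 cells, 1 metadata column, 2 genes (rows) x 3 cells (columns)
toy_obj <- list(
  ident = factor(c("0", "1", "1")),
  meta.data = data.frame(age = c("adult", "fetal", "cord"),
                         row.names = c("c1", "c2", "c3"),
                         stringsAsFactors = FALSE),
  data = matrix(c(0.0, 1.2, 2.5,
                  3.1, 0.0, 0.4),
                nrow = 2, byrow = TRUE,
                dimnames = list(c("CD34", "GATA1"), c("c1", "c2", "c3")))
)

resolve_var("age", toy_obj)$title   # "age"
resolve_var("CD34", toy_obj)$title  # "Expression of CD34"
```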
DBDimPlot("ident", sub = "What happens when \"ident\" is provided")
DBDimPlot("age", sub = "What happens when a \"meta-name\" is provided")
DBDimPlot("CD34", sub = "What happens when a \"gene-name\" is provided")
A couple of examples before jumping in, to showcase the flexible functionality:
DBDimPlot("new.HSPCcelltype",
main = "Easily plot where cells lie in PC/tsne/cca-space, and set all labels",
sub = "Easily set titles and all labels too!",
xlab = "PC1 (50%) (actually tSNE1, but this is an example!)",
ylab = "PC2 (20%)",
legend.title = "Celltype",
do.label = TRUE,
ellipse = TRUE,
cells.use = meta("new.HSPCcelltype")!="0", colors = c(2:8))
DBPlot("Score",
group.by = "Sample",
color.by = "age",
plots = c("vlnplot","jitter","boxplot"),
cells.use = meta("new.HSPCcelltype")=="GMP",
labels = c("Adult-1", "Adult-2", "Fetal-1", "Fetal-2", "Fetal-3", "Cord-1", "Cord-2"),
jitter.size = 0.5,
boxplot.color = "white",
boxplot.fill = FALSE,
y.breaks = c(-70, -35, 0, 35, 70),
hline = c(12, -20),
jitter.width = 0.35,
sub = "MUCH MORE FLEXIBLE than Seurat's native DimPlot and VlnPlot"
)
For more, see RNAseqPlotR_Vignette2.html